We've learned about vectors and their two-dimensional counterpart, matrices. Now we will learn about data frames, one of the main tools for data analysis with R! Matrices were limited because all the data inside a matrix had to be of the same data type (numeric, logical, etc.). With data frames, each column can hold a different data type, so we can organize and mix data types in one very powerful data structure!
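To make the matrix limitation concrete, here is a minimal sketch. The values and the variable names nums, labels, and m are made up for illustration. Combining numbers and strings in a matrix silently converts everything to character:

```r
# Made-up values, purely for illustration
nums <- c(1, 2, 3)
labels <- c("a", "b", "c")

# cbind() builds a matrix, and a matrix can hold only one data type,
# so the numbers are coerced to character strings
m <- cbind(nums, labels)

typeof(m)       # "character"
m[1, "nums"]    # the string "1", no longer a number
```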
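R actually has built-in datasets for quick reference to play around with! Check out the following ones. Note that not all of them are stored as data frames: women is a data frame, while state.x77 and USPersonalExpenditure are stored as matrices.
# Data about US states (stored as a matrix)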
state.x77
# US personal expenditure (stored as a matrix)
USPersonalExpenditure
# Average heights and weights for American women (a data frame)
women
To get a list of all available built-in datasets, use data():
data()
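Not every built-in dataset is stored as a data frame. A quick way to check, sketched here with the base R functions is.data.frame(), is.matrix(), and as.data.frame() (the name states_df is just a placeholder for this example):

```r
# is.data.frame() and is.matrix() report how each dataset is stored
is.data.frame(women)                # TRUE
is.data.frame(state.x77)            # FALSE
is.matrix(state.x77)                # TRUE
is.matrix(USPersonalExpenditure)    # TRUE

# as.data.frame() converts a matrix into a data frame
states_df <- as.data.frame(state.x77)
is.data.frame(states_df)            # TRUE
```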
You'll notice the states data was really big. We can use the head() and tail() functions to view the first and last 6 rows, respectively. Let's take a look:
# state.x77 is stored as a matrix, so convert it to a data frame
# (the shorter name also saves typing)
states <- as.data.frame(state.x77)
head(states)
tail(states)
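Six rows is only the default. As a small sketch using the built-in women data frame, both functions accept an n argument that sets how many rows to return (the names first_three and last_two are placeholders):

```r
# n controls how many rows come back (the default is 6)
first_three <- head(women, n = 3)   # first 3 rows
last_two <- tail(women, n = 2)      # last 2 rows

first_three
last_two
```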
We can use str() to get the structure of a data frame, such as its variable names and data types. We can use summary() to get a quick statistical summary of all the columns of a data frame. Depending on the data, this may or may not be useful!
# Statistical summary of data
summary(states)
# Structure of Data
str(states)
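When only one piece of that structural information is needed, a few smaller base R helpers return it directly. A minimal sketch using the built-in women data frame:

```r
# Number of rows and columns
dim(women)             # 15 2

# Column (variable) names
names(women)           # "height" "weight"

# Data type of each column
sapply(women, class)   # both columns are "numeric"
```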
A quick note: some people write dataframe as one word, but in R it's more commonly written as two words: data frame. It's not a huge deal either way, but if someone writes DataFrame they may be referring to a Python/pandas DataFrame, so keep that in mind!
We can create data frames with the data.frame() function by passing vectors as arguments. Each vector becomes a column of the data frame, so the vectors should all have the same length. Let's see a simple example:
# Some made up weather data
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
# Pass in the vectors:
df <- data.frame(days,temp,rain)
df
# Check structure
str(df)
summary(df)
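As a variation on the example above, columns can be given explicit names with name = value arguments. The names weather, day, temperature, and rained are made up for this sketch. Note that R versions before 4.0.0 convert character vectors to factors by default; passing stringsAsFactors = FALSE keeps them as character in any version.

```r
# Same made-up weather data as above
days <- c('mon', 'tue', 'wed', 'thu', 'fri')
temp <- c(22.2, 21, 23, 24.3, 25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# name = value sets each column name explicitly
weather <- data.frame(day = days, temperature = temp, rained = rain,
                      stringsAsFactors = FALSE)

# Each column keeps its own data type
sapply(weather, class)
# day: "character", temperature: "numeric", rained: "logical"
```

Unlike a matrix, the data frame holds character, numeric, and logical columns side by side without coercing them to a single type.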
That's it for the basics! Up next, we will learn about selecting and indexing data frame elements.